
Pretext task


CroCo: Self-Supervised Pre-training for 3D Vision Tasks by Cross-View Completion

Neural Information Processing Systems

Masked Image Modeling (MIM) has recently been established as a potent pre-training paradigm. A pretext task is constructed by masking patches in an input image, and the masked content is then predicted by a neural network using the visible patches as its sole input. This pre-training leads to state-of-the-art performance when fine-tuned for high-level semantic tasks, e.g.
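A minimal sketch of the generic MIM pretext task described above: split an image's patches into a visible set (the network input) and a masked set (the reconstruction target). It assumes uniform random masking, and the helper names `random_patch_mask` and `mim_split` are hypothetical. It shows plain single-image MIM only, not CroCo's cross-view completion, where a second view of the scene is also available.

```python
import random
from typing import List, Optional, Sequence, Tuple


def random_patch_mask(num_patches: int, mask_ratio: float,
                      seed: Optional[int] = None) -> Tuple[List[int], List[int]]:
    """Split patch indices 0..num_patches-1 into (visible, masked) lists."""
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    num_masked = int(round(num_patches * mask_ratio))
    # Uniform random masking (an assumption; other masking schemes exist).
    masked = sorted(rng.sample(range(num_patches), num_masked))
    masked_set = set(masked)
    visible = [i for i in range(num_patches) if i not in masked_set]
    return visible, masked


def mim_split(patches: Sequence, mask_ratio: float, seed: Optional[int] = None):
    """Return (network inputs, reconstruction targets) for one image."""
    visible, masked = random_patch_mask(len(patches), mask_ratio, seed)
    inputs = [patches[i] for i in visible]   # the only content the network sees
    targets = [patches[i] for i in masked]   # content it must predict
    return inputs, targets


# Hypothetical usage: a 14 x 14 grid of patches (e.g. 224 px images, 16 px patches).
patches = [f"patch_{i}" for i in range(196)]
inputs, targets = mim_split(patches, mask_ratio=0.75, seed=0)
print(len(inputs), len(targets))  # 49 147
```

The loss would then compare the network's predictions for the masked positions against `targets`; the visible/masked split is the whole pretext task, so no labels are needed.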


Useful Facts

Neural Information Processing Systems

A.1 Relation of Inverse Covariance Matrix and Partial Correlation

For a covariance matrix of the joint distribution of variables $X, Y$, the covariance matrix is … The derivation comes from the following:

Lemma A.1 (Conditional independence (adapted from [34])). Notice that for an arbitrary function $f$, $\mathbb{E}[f(X) \mid Y] = \mathbb{E}^L[f(X) \mid \phi_y(Y)]$, where $\phi_y$ is the one-hot encoding of the discrete variable $Y$. Therefore, for any feature map, we also obtain that conditional independence ensures: … This thus finishes the proof for Lemma D.4.

A.3 Technical Facts for Matrix Concentration

We include this covariance concentration result, adapted from Claim A.2 in [18]:

Claim A.2 (Covariance concentration for Gaussian variables). Let $X = [x_1, x_2, \dots, x_n]^\top \in \mathbb{R}^{n \times d}$, where each $x_i \sim \mathcal{N}(0, \Sigma_X)$. Then for any given matrix $B \in \mathbb{R}^{d \times m}$ that is of rank $k$ and is independent of $X$, with probability at least $1 - \delta/10$ over $X$ we have

$$0.9\, B^\top \Sigma_X B \preceq \frac{1}{n} B^\top X^\top X B \preceq 1.1\, B^\top \Sigma_X B.$$

Let $X = [x_1, x_2, \dots, x_n]^\top \in \mathbb{R}^{n \times d}$, where each $x_i$ is $\rho^2$-sub-Gaussian. Then for any given matrix $B \in \mathbb{R}^{d \times m}$ that is of rank $k$ and is independent of $X$, with probability at least $1 - \delta/10$ over $X$ we have

$$0.9\, B^\top \Sigma_X B \preceq \frac{1}{n} B^\top X^\top X B \preceq 1.1\, B^\top \Sigma_X B.$$

Let $Z \in \mathbb{R}^{n \times k}$ be a matrix with row vectors sampled i.i.d. from the Gaussian distribution $\mathcal{N}(0, \Sigma_Z)$. Let $P \in \mathbb{R}^{n \times n}$ be a fixed projection onto a space of dimension $d$.
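The Gaussian concentration claim can be sanity-checked numerically in its simplest rank-1 case, $m = k = 1$, where $B$ is a single vector $b$ and the Loewner-order sandwich reduces to the scalar inequality $0.9\, b^\top \Sigma_X b \le \frac{1}{n}\lVert X b \rVert^2 \le 1.1\, b^\top \Sigma_X b$. The sketch below assumes $d = 2$ and a hypothetical $\Sigma_X$ and $b$ chosen for illustration; it checks one draw and is not a proof, nor does it say anything about the sample size needed for a given $\delta$.

```python
import math
import random


def sample_gaussian_2d(rng: random.Random, cov):
    """Draw x ~ N(0, cov) for a 2x2 positive-definite cov via its Cholesky factor."""
    (a, b), (_, c) = cov
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 * l21)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return (l11 * z1, l21 * z1 + l22 * z2)


def concentration_ratio(n: int, cov, b, seed: int = 0) -> float:
    """Return (1/n) * ||X b||^2 / (b^T Sigma_X b) for n i.i.d. rows x_i ~ N(0, cov)."""
    rng = random.Random(seed)
    # Population quadratic form b^T Sigma_X b.
    pop = (b[0] * (cov[0][0] * b[0] + cov[0][1] * b[1])
           + b[1] * (cov[1][0] * b[0] + cov[1][1] * b[1]))
    emp = 0.0
    for _ in range(n):
        x = sample_gaussian_2d(rng, cov)
        proj = x[0] * b[0] + x[1] * b[1]  # scalar b^T x_i
        emp += proj * proj
    return (emp / n) / pop


cov = ((2.0, 0.6), (0.6, 1.0))   # hypothetical Sigma_X
b = (1.0, -0.5)                  # hypothetical rank-1 B
r = concentration_ratio(20_000, cov, b, seed=0)
print(0.9 <= r <= 1.1)
```

For Gaussian rows the ratio is a scaled chi-square average with standard deviation about $\sqrt{2/n}$, so with $n = 20{,}000$ the band $[0.9, 1.1]$ sits roughly ten standard deviations from 1.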




DropPos: Pre-Training Vision Transformers by Reconstructing Dropped Positions

Neural Information Processing Systems

To answer this question, we begin by revisiting the forward procedure of ViTs. A sequence of positional embeddings (PEs) [51] is added to patch embeddings to preserve position information. Intuitively, simply discarding these PEs and requesting the model to reconstruct the position for each patch naturally becomes a qualified location-aware pretext task.
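A minimal sketch of the pretext task described above: add positional embeddings (PEs) to a random subset of patch embeddings, drop them for the rest, and use each dropped patch's original position index as a classification target. The helper `drop_positions`, the uniform choice of dropped patches, and the `IGNORE = -1` label for tokens without a target are all assumptions for illustration; any further details of DropPos (for example, additional patch masking) are not shown.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]
IGNORE = -1  # hypothetical label for tokens that keep their PE (excluded from the loss)


def drop_positions(patch_emb: Sequence[Vector], pos_emb: Sequence[Vector],
                   drop_ratio: float, seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Return (tokens, targets): targets[i] is i if token i lost its PE, else IGNORE."""
    if len(patch_emb) != len(pos_emb):
        raise ValueError("need exactly one positional embedding per patch")
    n = len(patch_emb)
    rng = random.Random(seed)
    dropped = set(rng.sample(range(n), int(round(n * drop_ratio))))
    tokens: List[Vector] = []
    targets: List[int] = []
    for i, (p, e) in enumerate(zip(patch_emb, pos_emb)):
        if i in dropped:
            tokens.append(list(p))  # PE discarded: the model must infer the position
            targets.append(i)       # class label = original position index
        else:
            tokens.append([a + c for a, c in zip(p, e)])  # standard ViT input: patch + PE
            targets.append(IGNORE)
    return tokens, targets


patch = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 1.0]]
pos = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0], [0.25, 0.25]]
tokens, targets = drop_positions(patch, pos, drop_ratio=0.5, seed=3)
print(targets)
```

Framing position recovery as an $n$-way classification over patch indices keeps the target discrete; a model head would output one logit per position for every token whose target is not `IGNORE`.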


Representation Learning via Consistent Assignment of Views over Random Partitions

Neural Information Processing Systems

CARP learns prototypes in an end-to-end online fashion using gradient descent without additional non-differentiable modules to solve the cluster assignment problem. CARP optimizes a new pretext task based on random partitions of prototypes that regularizes the model and enforces consistency between views' assignments.
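A simplified sketch of a consistency objective over random partitions of prototypes: shuffle the prototype indices, split them into equal-size blocks, compute each view's soft assignment within every block, and penalize disagreement between the two views. The function name, the temperature, and the per-block loss $-\log\langle p_1, p_2\rangle$ are assumptions for illustration and are not claimed to be CARP's exact objective. Objectives of this kind usually also need a term that discourages collapse onto a single prototype, which is omitted here.

```python
import math
import random
from typing import List, Sequence


def softmax(xs: Sequence[float], temperature: float) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp((x - m) / temperature) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def random_partition_consistency(z1, z2, prototypes, block_size: int,
                                 temperature: float = 0.1, seed: int = 0) -> float:
    """Hypothetical consistency loss between two views over random prototype blocks."""
    k = len(prototypes)
    if k % block_size != 0:
        raise ValueError("block_size must divide the number of prototypes")
    rng = random.Random(seed)
    order = list(range(k))
    rng.shuffle(order)  # a fresh random partition of the prototypes
    blocks = [order[i:i + block_size] for i in range(0, k, block_size)]
    loss = 0.0
    for block in blocks:
        # Soft assignment of each view, restricted to this block of prototypes.
        p1 = softmax([dot(z1, prototypes[j]) for j in block], temperature)
        p2 = softmax([dot(z2, prototypes[j]) for j in block], temperature)
        # Inner product is near 1 when both views favor the same prototype.
        loss += -math.log(dot(p1, p2))
    return loss / len(blocks)


protos = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
same = random_partition_consistency(protos[0], protos[0], protos, block_size=2)
diff = random_partition_consistency(protos[0], protos[1], protos, block_size=2)
print(same < diff)
```

Because the partition is resampled each step, a view cannot rely on one fixed competitor set of prototypes, which is one plausible reading of how random partitions act as a regularizer.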




Supplementary Materials: VIME: Extending the Success of Self- and Semi-supervised Learning to Tabular Domain

Neural Information Processing Systems

Semi-supervised learning uses the trained encoder in learning a predictive model on both labeled and unlabeled data. Figure 3: The proposed data corruption procedure. The original feature matrix ($X$) consists of four samples $x_i$, $i = 1, \dots, 4$, where each row/column represents a sample/feature, and the features in each sample are represented by the same color. In the experiment section of the main manuscript, we evaluate VIME and its benchmarks on 11 datasets (6 genomics, 2 clinical, and 3 public datasets). The selected SNPs and the corresponding blood cell trait together form an independent labeled dataset.
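A minimal sketch of a data corruption procedure like the one in Figure 3, under stated assumptions: each entry is selected with probability `p_m` by a binary mask, and selected entries are replaced by values drawn from the same feature's empirical marginal, implemented here as an independent shuffle of each column. The function name `vime_corrupt` and this column-shuffle implementation are illustrative assumptions, not the authors' code.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def vime_corrupt(x: Sequence[Sequence[float]], p_m: float,
                 seed: int = 0) -> Tuple[Matrix, List[List[int]]]:
    """Return (x_tilde, mask); mask[i][j] == 1 marks an entry that was replaced."""
    rng = random.Random(seed)
    n, d = len(x), len(x[0])
    # Bernoulli(p_m) mask over samples (rows) and features (columns).
    mask = [[1 if rng.random() < p_m else 0 for _ in range(d)] for _ in range(n)]
    # Shuffling each column independently draws from that feature's empirical marginal.
    x_bar = [[0.0] * d for _ in range(n)]
    for j in range(d):
        perm = list(range(n))
        rng.shuffle(perm)
        for i in range(n):
            x_bar[i][j] = x[perm[i]][j]
    # Keep original values where mask == 0, take marginal samples where mask == 1.
    x_tilde = [[x_bar[i][j] if mask[i][j] else x[i][j] for j in range(d)]
               for i in range(n)]
    return x_tilde, mask


# Four samples (rows) by three features (columns), as in the figure's toy setting.
x = [[1.0, 10.0, 100.0], [2.0, 20.0, 200.0], [3.0, 30.0, 300.0], [4.0, 40.0, 400.0]]
x_tilde, mask = vime_corrupt(x, p_m=0.3, seed=0)
print(mask)
```

In a self-supervised setup built on this corruption, an encoder would receive `x_tilde` and be trained to predict `mask` and to reconstruct `x`, so both targets come for free from unlabeled rows.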